We consider the problem of searching for a node on a labelled random graph using a greedy algorithm that selects a route to the desired node using metric information on the graph. Motivated by peer-to-peer networks, we propose two types of random graph with properties particularly amenable to this kind of algorithm. We derive equations for the probability that the search is successful, study the number of hops required, and find both numerical and analytic evidence of a transition as the number of links is varied.



I. INTRODUCTION 

More than a decade ago, small world graphs were proposed to model networks found in nature, and this subsequently led to an explosion of interest in the topic. The shortest path between two nodes in a small world graph is typically of order $\log N$, where $N$ is the size of the graph. However, although graphs with such short paths can be constructed, discovering the path or route to a specified destination node is another matter. Typical approaches are based on recursively flooding enquiries to all neighbours on the graph, and this kind of global propagation underlies shortest path algorithms such as Dijkstra or Bellman-Ford. These approaches consequently lack scalability, and indeed routing on the internet uses a hierarchical approach to counteract the problem. Modern peer-to-peer protocols take a different approach to avoid the scaling difficulty: they use a particular network structure that allows a small number of select queries to efficiently find the required destination.

This paper considers the problem of how to find the shortest path to a given labelled node on a random graph without relying on global propagation. Our approach relies on two essential ingredients: a simple rule acting on local information at each node, and an appropriately chosen random graph structure. The complications of the peer-to-peer systems that motivate the study are discarded so that these ingredients can be chosen in the simplest way that exposes the problem. Firstly, we assume that the routing rule greedily attempts to get as close as possible to the destination at each step. For this to work, the graphs we consider must have a small-world-like structure; that is, they have short characteristic path length but tend to cluster nodes with similar labels together. Moreover, to simplify the analysis by allowing local information to uniquely specify which link to take at intermediate steps, we require that the graphs satisfy a strong version of the triangle inequality that determines, not merely bounds, the third side of a triangle in terms of the other two. These requirements are not satisfied by the random relinking construction of Watts and Strogatz [1]; instead, our random graphs are based on a modification of the traditional construction of Erdos and Renyi [2] that selects links to delete from the fully connected graph with a probability depending on their metric weight. To encourage clustering of nearby nodes, we only consider probability distributions that favour short links.

Having mentioned peer-to-peer systems as a motivation, it may be helpful to briefly review aspects of modern distributed hash tables to clarify the relationship to our work. These systems store data in a distributed setting by associating the data with an integer key (the hash) and placing the data on a computer node whose integer nodeID is close in value to the key. The essential function these systems provide is to allow any node to efficiently retrieve the data, which amounts to locating a particular node according to its nodeID. The nodes reside on a computer network that allows packets to be sent to any node given its computer address. However, the address and the nodeID are distinct entities, and nodes only store a table of addresses corresponding to a small fraction of the total number of nodes. A node, and the data stored on it, is located by a series of queries to other nodes that return the addresses from their own tables that are closest to the desired node. The queries are first sent to the nodes in the local table that are closest to the nodeID being sought, and then recursively to the addresses returned from the queries. This process can involve multiple queries at each stage, and if a query fails to return a closer node address, other queries are attempted. Through these mechanisms nodes are located successfully and efficiently with overwhelming probability.

Two well-studied peer-to-peer systems are Chord and Kademlia [4], and the metrics they use to determine the closeness of nodeIDs (distance around a circle and XOR, respectively) are exactly those used as the metrics on the random graphs in this paper. Nodes in peer-to-peer systems use the metric to organise their table of nodeID addresses so that it contains more addresses of nearby nodeIDs than of distant ones. The random graphs in our work can be thought of as determined by the connectivity implied by these tables, and will have clustering as a consequence. The degree of a node in the random graph should therefore be determined by the size of the table in the peer-to-peer system. Although this table size appears to be configurable, it must in fact grow as the log of the maximum number of nodeIDs, and the degree of a node in the random graph also grows as $\log N$. However, peer-to-peer systems are dynamic, with nodes continually entering and leaving the system, so the nodeIDs are sparse and the local tables themselves are frequently updated. This dynamic aspect is not part of our work: we take the nodes to be labelled contiguously and consider static graphs with randomly generated connectivity. The series of queries that occurs in a node lookup in a peer-to-peer system is replaced in our work by the stepwise deterministic construction of a path along neighbouring links of the random graph towards the desired node. Crucially, we use a greedy algorithm that at each step discards all routes that are not the best, so the path can arrive at a dead end where no neighbour is closer to the final destination. In this case the search has failed.

In summary, the context within which we study this problem is that of a simple greedy algorithm and a rather 
complex random graph structure. While peer-to-peer systems motivate this context, the algorithm used is different and 
their overlay networks are certainly not random graphs. Moreover, in our analysis we are concerned with behaviour 
in the large graph limit. We consider the probability of successful search and in particular its asymptotic value as 
the sought node becomes further and further away. We also measure statistics for the number of steps in a successful 
path. The most interesting observation is that there appears to be a transition in the probability of success as the 
average connectivity is varied. This is quite distinct from the percolation transition as all the random graphs we 
consider will consist of a single connected component. 

Subsequent sections will discuss the construction of the random graphs in detail and study some of their properties. 
The greedy routing algorithm is then introduced and basic equations used to analyse the algorithm are derived. The 
analysis itself constitutes the main section and considerable attention is paid to one model that can be solved exactly. 
However, although this solution provides clues that are used for approximate analysis, we rely heavily on numerical 
solution of the equations and also check our results against simulation of the whole system. 

II. GRAPHS 

We shall consider graphs constructed in a manner similar to traditional Erdos and Renyi random graphs [5] (as opposed to the configuration approach of Molloy and Reed [5]), by diluting the links of a fully connected metric graph. First imagine a fully connected graph on $N$ nodes labelled $0, 1, 2, \ldots, N-1$, in which the link between nodes labelled $a$ and $b$ has length $d_{ab}$ according to the graph's metric. Links are selected to appear in the random graph according to a length-dependent probability distribution. In order that the graph be uniform, in the sense that all nodes are statistically equivalent, the set of link lengths emerging from any node in the fully connected graph must be the same; for the graphs we consider, it takes the discrete values $1, 2, 3, \ldots, N-1$.

We shall imagine that each node knows the labels of the neighbouring nodes to which it is directly connected, and 
nothing else. Specifically it has no knowledge of who its neighbours are connected to. This avoids undue complexity 
such as routing updates and complex forwarding tables at each node and radically simplifies the dynamic nature of 
peer-to-peer networks. In order that this local information is sufficient to allow the greedy algorithm to determine 
which link to use to get closest to the eventual goal, we consider metrics having the property that the length of the 
third side of a triangle is completely determined by the lengths of the other two sides. Two metrics with the necessary 
properties are the circle and XOR metrics. 

Circle metric. The length of a link from node labelled $a$ to $b$ is:

$$d_{ab} = (b - a) \bmod N \qquad (1)$$

This is simply the one-way distance around a circle, as shown in figure 1. The metric is not symmetric and the resulting graph is directed. A triangle with sides of length $i$ and $j$, with $i > j$, has its third side directed from the endpoint of $j$, with length $d_{ji} = i - j$, where the metric function is now applied to lengths rather than node indices. In the peer-to-peer context, this metric is used in the Chord approach [3].
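As a sanity check on the circle metric (1) and its triangle property, the following Python sketch (function names are illustrative, not from the paper) verifies exhaustively for a small ring that whenever $d_{ab} > d_{ac}$, the remaining distance $d_{cb}$ is exactly $d_{ab} - d_{ac}$.

```python
def circle_distance(a: int, b: int, n: int) -> int:
    """One-way distance from node a to node b around a circle of n nodes, eq. (1)."""
    return (b - a) % n


def check_circle_triangle(n: int) -> bool:
    """True if, for all a, b, c with d_ab > d_ac, the third side d_cb equals d_ab - d_ac."""
    for a in range(n):
        for b in range(n):
            for c in range(n):
                i = circle_distance(a, b, n)  # side a -> b
                j = circle_distance(a, c, n)  # side a -> c
                if i > j and circle_distance(c, b, n) != i - j:
                    return False
    return True


if __name__ == "__main__":
    print(check_circle_triangle(16))  # True
```

The check holds because $(b-c) \bmod N = ((b-a)-(c-a)) \bmod N$, and $0 < i - j < N$ whenever $i > j$.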

XOR metric. The length of a link between nodes labelled $a$ and $b$ is:

$$d_{ab} = a \oplus b \qquad (2)$$

where $\oplus$ represents the bitwise XOR of the integer arguments. In this case $N$ must be a power of 2 to preserve the uniformity of the graph. It is possible to interpret this as the Manhattan distance when the nodes are placed on a lattice, as shown for a simple case of 8 nodes in figure 2. A consequence of this metric is that a triangle with sides of length $i$ and $j$ has a third side of length $d_{ij} = i \oplus j$. In the peer-to-peer context, this metric is used in the Kademlia approach [3].
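A similar minimal sketch for the XOR metric (2), again with hypothetical helper names: it checks that for $N$ a power of 2 every node sees each length $1, \ldots, N-1$ exactly once (the uniformity condition), that this fails for other $N$, and that the third side of every triangle is the XOR of the other two.

```python
def xor_distance(a: int, b: int) -> int:
    """XOR distance between node labels a and b, eq. (2)."""
    return a ^ b


def check_xor_uniform(n: int) -> bool:
    """Every node sees each length 1..n-1 exactly once (holds when n is a power of 2)."""
    return all(
        sorted(xor_distance(a, b) for b in range(n) if b != a) == list(range(1, n))
        for a in range(n)
    )


def check_xor_triangle(n: int) -> bool:
    """Sides i = d_ab and j = d_ac force the third side d_cb = i XOR j."""
    return all(
        xor_distance(c, b) == xor_distance(a, b) ^ xor_distance(a, c)
        for a in range(n)
        for b in range(n)
        for c in range(n)
    )
```

For example, `check_xor_uniform(12)` is false because node 4 sees lengths 12 to 15, which lie outside $1, \ldots, 11$.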

The common structure of the graphs we consider arises from insisting that the probability $p(d)$ of including a link of length $d$ is a strictly decreasing function of distance. At least for the circle metric, this intuitively gives rise to many short links along the perimeter of the circle and fewer longer links as chords across the circle, along the lines of the small world proposal of Watts and Strogatz [1]. We expect significant clustering as a consequence. The picture is less clear for the XOR metric, but again we anticipate some kind of small-world-like structure to arise. Such structure is desirable for a greedy routing algorithm, as discussed in the Introduction.

Preliminary investigations indicated that exponential decay of the probability distribution, even when appropriately scaled, does not allow enough long links for efficient routing, so we have concentrated our effort on power-law distributions:

$$p(d) = \frac{z}{d^{\alpha}} \qquad (3)$$

The bulk of the analysis will be for the case $\alpha = 1$, but to motivate this choice we retain $\alpha$ as a parameter for the present. Were $\alpha$ zero, the graph would be an Erdos and Renyi random graph with average degree $zN$; in general, $z$ has a related meaning as a factor that scales the average degree. To bound the probability of length-1 links we take $0 < z \le 1$, though it would be possible to consider $z > 1$ provided short links are automatically present and the probabilistic link selection only applies to longer links. For example, with the circle metric, $z = 1$ causes all nearest neighbours to be connected and routability is guaranteed. This is not the case for the XOR metric, which requires some longer links in order to create a giant component. In contrast to Erdos and Renyi random graphs, $z$ does not scale with $N$ (as $1/N$ in that case), and consequently the average degree depends on $N$.
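The link selection rule (3) can be sketched as follows. This is an illustrative construction under our own assumptions: for the circle metric each ordered pair is tried once (giving a directed graph), for the XOR metric each unordered pair is tried once, and `sample_graph` and `mean_degree` are hypothetical names.

```python
import random


def link_probability(d: int, z: float, alpha: float) -> float:
    """Power-law link selection probability p(d) = z / d**alpha, eq. (3)."""
    return z / d ** alpha


def sample_graph(n: int, z: float, alpha: float, metric: str, seed=None):
    """Dilute the fully connected metric graph on n nodes.

    Returns a dict mapping each node to the list of nodes it links to.
    circle: directed, each ordered pair (a, b) is tried with length (b - a) % n.
    xor: undirected, each unordered pair is tried once with length a ^ b
         (n should be a power of 2 for uniformity).
    """
    rng = random.Random(seed)
    out = {a: [] for a in range(n)}
    if metric == "circle":
        for a in range(n):
            for b in range(n):
                if a != b and rng.random() < link_probability((b - a) % n, z, alpha):
                    out[a].append(b)
    elif metric == "xor":
        for a in range(n):
            for b in range(a + 1, n):
                if rng.random() < link_probability(a ^ b, z, alpha):
                    out[a].append(b)
                    out[b].append(a)
    else:
        raise ValueError("metric must be 'circle' or 'xor'")
    return out


def mean_degree(graph) -> float:
    """Average (out-)degree over all nodes."""
    return sum(len(v) for v in graph.values()) / len(graph)
```

The empirical mean degree should be close to $zH_{N-1,\alpha}$; for instance, `mean_degree(sample_graph(256, 0.5, 1.0, "xor", seed=1))` should come out near $0.5\,H_{255} \approx 3.06$.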



A. Graph Properties 

Before proceeding to consider routing, let us briefly characterise the $\alpha$ and $z$ parameter regimes according to some standard graph properties. We employ the techniques described in [S] and use the following generating function for the probability of vertex degrees:

$$G_0(x) = \prod_{d=1}^{N-1}\left(1 - p(d) + p(d)\,x\right) \qquad (4)$$

This follows from the uniform property of the graphs without the need to specify which metric is used, though in the case of the directed circle metric it only counts either the in- or the out-degrees.

The moments of the degree distribution are computed by taking derivatives of the generating function. For example, the mean and the first few central moments are:

$$\langle k\rangle = G_0'(1) = \sum_{i=1}^{N-1}\frac{z}{i^{\alpha}} = z H_{N-1,\alpha}$$
$$\langle (k-\langle k\rangle)^2\rangle = z H_{N-1,\alpha} - z^2 H_{N-1,2\alpha}$$
$$\langle (k-\langle k\rangle)^3\rangle = z H_{N-1,\alpha} - 3z^2 H_{N-1,2\alpha} + 2z^3 H_{N-1,3\alpha}$$

where $H_{n,m} = \sum_{i=1}^{n} i^{-m}$ is the generalised harmonic number, which in the limit $n \to \infty$ becomes the Riemann zeta function $\zeta(m)$ (for $m > 1$). Notice that for $\alpha < 1$ the average degree grows as $N^{1-\alpha}$. The shape of the degree distribution resembles that of a Poisson law, but is narrower (its variance is smaller than its mean). Even for large graphs with $\alpha < 1$, where some of the terms in the expressions above may be dropped, the moments do not match those of a Poisson distribution.
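The generating function (4) can be expanded exactly for modest $N$ by multiplying out the binomial factors one at a time. This sketch (helper names are hypothetical) does so and computes the moments directly from the resulting distribution, so they can be compared with the closed forms above.

```python
def degree_distribution(n: int, z: float, alpha: float) -> list:
    """Coefficients of G0(x) = prod_{d=1}^{n-1} (1 - p(d) + p(d) x), eq. (4).

    Entry k is the probability that a node has degree k.
    """
    coeffs = [1.0]
    for d in range(1, n):
        p = z / d ** alpha
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1.0 - p)  # no link of length d
            nxt[k + 1] += c * p      # link of length d present
        coeffs = nxt
    return coeffs


def central_moments(dist: list):
    """Mean, variance and third central moment of a distribution over k = 0, 1, ..."""
    mean = sum(k * pk for k, pk in enumerate(dist))
    var = sum((k - mean) ** 2 * pk for k, pk in enumerate(dist))
    third = sum((k - mean) ** 3 * pk for k, pk in enumerate(dist))
    return mean, var, third


def harmonic(n: int, m: float) -> float:
    """Generalised harmonic number H_{n,m}."""
    return sum(1.0 / i ** m for i in range(1, n + 1))
```

With $N = 200$, $z = 0.7$ and $\alpha = 1$, the mean, variance and third central moment agree with $zH_{N-1,1}$, $zH_{N-1,1} - z^2H_{N-1,2}$ and $zH_{N-1,1} - 3z^2H_{N-1,2} + 2z^3H_{N-1,3}$ to rounding error.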

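The cluster coefficient $C$ is defined as the ratio of three times the number of triangles to the number of connected triples, $3N_\Delta/N_3$. The number of connected triples can be obtained from the vertex degree generating function as:

$$N_3 = \frac{N}{2}\,G_0''(1) = \frac{N z^2}{2}\left((H_{N-1,\alpha})^2 - H_{N-1,2\alpha}\right) \qquad (5)$$

Using the uniqueness of lengths and the property of the metric that determines the length of the third side of a triangle in terms of the other two, it is straightforward to compute the probability of selecting links that form a triangle and consequently the number of triangles:

$$N_\Delta = \frac{N}{3}\sum_{i=1}^{N-1}\sum_{j=1}^{i-1} p(i)\,p(j)\,p(\ell_{ij}) \qquad (6)$$

where $\ell_{ij}$ is the length of the third side fixed by the metric ($i \oplus j$ for the XOR metric; for the directed circle metric the counting factors change, as discussed below).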

The cluster coefficient is always proportional to $z$, but the sum above is unwieldy for general $\alpha$, and its value depends on whether the graph is based on the circle or XOR metric. For $\alpha > 1$ the coefficient has a finite limit for large $N$, but this is not generally the case for $\alpha < 1$. At $\alpha = 1$ the sum becomes more tractable, and for the XOR metric we find:

$$\frac{3N_\Delta}{N_3} \approx \frac{2z}{(H_{N-1})^2 - \zeta(2)}\sum_{i=1}^{N-1}\sum_{j=1}^{i-1}\frac{1}{ij\,(i \oplus j)} \qquad (7)$$

The unusual sum appearing in this formula is discussed in the appendix; it approaches a finite limit of approximately 1.54 in the large $N$ limit. Clustering therefore vanishes slowly, as $1/\log^2 N$, for large $N$, and simulations confirm the form derived above.

In the case of the circle metric, more care is needed to take account of the directed nature of the links. This changes some factors but ends in a similar-looking formula:

$$\frac{3N_\Delta}{N_3} \approx \frac{3z}{2\left((H_{N-1})^2 - \zeta(2)\right)}\sum_{i=1}^{N-1}\sum_{j=1}^{i-1}\frac{1}{ij\,(i-j)} \qquad (8)$$

Here the limiting value of the sum can be expressed as $2\zeta(3) \approx 2 \times 1.202057$.
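The double sums in (7) and (8) are easy to evaluate numerically. The sketch below only checks the sums themselves and uses the $\zeta(2)$ form of the denominator in (7); the function names are illustrative. For the circle sum, partial fractions collapse the inner sum to $2H_{i-1}/i^2$, which is why its limit is $2\zeta(3)$.

```python
import math

ZETA3 = 1.2020569031595942  # Apery's constant, zeta(3)


def xor_triangle_sum(n: int) -> float:
    """Double sum in (7): sum_{i<n} sum_{j<i} 1 / (i j (i XOR j))."""
    return sum(1.0 / (i * j * (i ^ j)) for i in range(1, n) for j in range(1, i))


def circle_triangle_sum(n: int) -> float:
    """Double sum in (8): sum_{i<n} sum_{j<i} 1 / (i j (i - j))."""
    return sum(1.0 / (i * j * (i - j)) for i in range(1, n) for j in range(1, i))


def xor_clustering(n: int, z: float) -> float:
    """Cluster coefficient of the XOR graph at alpha = 1 from formula (7)."""
    h = sum(1.0 / i for i in range(1, n))
    return 2.0 * z * xor_triangle_sum(n) / (h * h - math.pi ** 2 / 6)
```

Both sums increase monotonically with $N$, while the prefactor in (7) decays like $1/\log^2 N$, so the computed clustering slowly decreases as $N$ grows.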

As a large connected component is a precondition for successfully finding nodes, we want to ensure that the network is above any possible percolation threshold. For $\alpha < 1$ the average degree grows with $N$, and in contrast to Erdos and Renyi random graphs, we expect percolation to occur for any value of $z$. This is indeed observed in simulation, but there is no simple proof, since the generating function techniques of [B] cannot be relied upon for this purpose when the existence of links is distance dependent. We have just seen that for $\alpha > 1$ there is finite clustering in the thermodynamic limit, and in this range there may be a complex percolation threshold in the $(\alpha, z)$ plane. In the case of the circle metric at $\alpha < 1$, a rough test is to compute the probability that the circle contains a gap with no links crossing it. This probability goes to zero for any value of $z$, but there may still be finite size effects at small $z$.
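A rough version of this gap test can be sketched under a simplifying assumption of our own: a directed link of length $d$ crosses a given cut for exactly $d$ of its $N$ starting nodes, so the probability that the cut is uncrossed is $\prod_{d=1}^{N-1}(1-p(d))^d$, and multiplying by $N$ gives a union bound on the probability that some gap exists. This is an illustration with hypothetical function names, not the authors' calculation.

```python
import math


def log_prob_cut_uncrossed(n: int, z: float, alpha: float) -> float:
    """log P(no directed link crosses one fixed cut of the circle).

    Assumes independent links; a link of length d crosses a given cut for
    exactly d of its n possible start nodes. Requires 0 < z < 1.
    """
    return sum(d * math.log1p(-z / d ** alpha) for d in range(1, n))


def log_gap_union_bound(n: int, z: float, alpha: float) -> float:
    """log of the union bound n * P(one cut uncrossed) on P(some gap exists)."""
    return math.log(n) + log_prob_cut_uncrossed(n, z, alpha)
```

Working in log space avoids underflow: for $\alpha = 1/2$ the exponent behaves like $-z N^{3/2}$, so the bound vanishes very quickly with $N$ even for small $z$.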

In summary:

$\alpha > 1$: $\langle k\rangle$ does not grow with $N$ and there is finite triangle clustering. There may not be a giant component.
$\alpha < 1$: $\langle k\rangle$ is large and grows with $N$, but the triangle clustering coefficient decreases to zero.

As examples, we have simulated inverse square and inverse square root laws and find the typical behaviour described above. Each regime has disadvantages. To be certain of a giant component we should avoid $\alpha > 1$, but in the $\alpha < 1$ regime $\langle k\rangle$ grows. To understand why this second issue is a problem we must start to consider routing. The average size of the table needed to keep track of neighbouring nodes is the mean number of links leaving a node, that is, $\langle k\rangle$. For efficiency, the amount of information stored on a node should not grow too quickly with $N$, so the parameter regime $\alpha < 1$ is less appealing.

The most interesting regime occurs at $\alpha = 1$, and for the rest of this paper we work at this point. This leads to slow logarithmic growth of the routing table, but potentially successful search. Moreover, this is precisely the situation motivated by Kademlia, which also has a table size that grows logarithmically. Although we expect a giant component at $\alpha = 1$ and see one in simulations, we should beware of potential finite size effects when the parameter $z$ is small. Clustering does go to zero for large graphs, but it does so as $1/\log^2 N$, which is much slower than the $1/N$ expected for Erdos and Renyi random graphs. In fact, it is the clustering of nodes with nearby labels that is important in this work rather than the global triangle cluster coefficient, so it may still be legitimate to call these graphs small world.
FIG. 3: Routing from node $a$ to node $b$ via node $c_i$. For the greedy algorithm to take this route there must be no link to any node $c_j$ closer to the destination than $c_i$.



III. GREEDY ROUTING 



For the graphs we consider, the only information the greedy routing algorithm requires at a node is the list of its neighbours. This list contains the node indices of the neighbours, and from this information the special distance metrics we have chosen allow computation of the distance to each neighbour and, moreover, of the remaining distance from that neighbour to the final destination.

At each step the greedy algorithm hops to the neighbouring node that is closest to the eventual destination. If no neighbour is closer than the present node, routing has failed under this mechanism. No backtracking is allowed. Note that the length of the hop to the neighbour is not directly relevant to this algorithm, except through the triangle relation.
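The greedy rule can be written in a few lines. This sketch assumes the graph is stored as a dictionary of neighbour lists and that the metric is passed in as a function; both are illustrative choices, and `greedy_route` is a hypothetical name. Because distinct neighbours are at distinct distances from the destination under both metrics, the minimum is never tied.

```python
from typing import Callable, Dict, List, Optional


def greedy_route(
    neighbours: Dict[int, List[int]],
    source: int,
    target: int,
    dist: Callable[[int, int], int],
) -> Optional[List[int]]:
    """Follow the greedy rule from source towards target.

    Returns the list of visited nodes, or None if the walk reaches a dead end
    (no neighbour strictly closer to the target). No backtracking is done.
    """
    path = [source]
    current = source
    while current != target:
        # Only local information is used: the labels of current's neighbours.
        candidates = neighbours.get(current, [])
        if not candidates:
            return None
        best = min(candidates, key=lambda c: dist(c, target))
        if dist(best, target) >= dist(current, target):
            return None  # dead end: routing has failed
        current = best
        path.append(current)
    return path
```

With the XOR metric and links $0$-$1$, $1$-$2$, $2$-$3$ only, the route from 0 to 3 is 0, 1, 2, 3; adding a link $0$-$2$ shortens it to 0, 2, 3.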

To analyse this algorithm, consider the probability $q(d,k)$ of successfully reaching the destination at distance $d$ in $k$ hops. Of course, the number of hops is limited by the distance, $k \le d$, as each hop must get closer to the goal. For the case of a single hop, $q(d,1)$ is merely the probability that a direct link exists:

$$q(d,1) = p(d) \qquad (9)$$

The probability of success in more than one hop can be computed iteratively. Consider figure 3 for routing from node $a$ to node $b$ via node $c_i$, where $c_i$ is at distance $i$ from $b$ (so that $c_0 = b$):

$$q(d,k+1) = \sum_{i=k}^{d-1} p(d_{ac_i})\,q(i,k)\prod_{j=0}^{i-1}\left(1 - p(d_{ac_j})\right) \qquad (10)$$

Here node labels and distances are as in the figure. The first two factors are self-evident, and the product accounts for the greediness of the algorithm by ensuring that there is no neighbour $c_j$ of $a$ closer to $b$ than $c_i$. Note that, by virtue of the greediness of the algorithm, $q(d,k)$ is independent of $N$, except that it vanishes for $d \ge N$.

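Using the triangle property of the metrics and specialising to the $\alpha = 1$ probability distribution, this can be written explicitly in terms of the metric function for distances as:

$$q(d,k+1) = \sum_{i=k}^{d-1}\frac{z}{d_{id}}\,q(i,k)\prod_{j=0}^{i-1}\left(1 - \frac{z}{d_{jd}}\right) \qquad (11)$$

where $d_{id} = d - i$ for the circle metric and $d_{id} = i \oplus d$ for the XOR metric.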



To illuminate the routing algorithm and the recursion relation, it is helpful to consider $N = 4$. We shall do this for the XOR metric, and the reader should have in mind an image of the graph similar to that in figure 2. The resulting probabilities are shown in Table I. As an example of a two-hop path, consider $q(3,2)$. Starting from node 0 there are two paths to the destination node 3, namely 013 and 023, and in both cases there is a factor $1 - z/3$ to ensure that there is no direct path. The preferred path would be 023, as the intermediate point is closer to the destination: the contribution from this path is $(z/2)\,q(1,1)(1 - z/3)$. For the path 013 there is an additional factor to exclude the possibility that the preferred path 023 exists, and the contribution is $z\,q(2,1)(1 - z/2)(1 - z/3)$. For the three-hop case $q(3,3)$, notice that only one path is possible: 0123, not 0213, as the step from 2 to 1 would move further from the goal. The $z$ dependence in the table agrees with numerical simulations. For the circle metric the table is similar, but the factors are not identical.

TABLE I: Values of $q(d,k)$ for the XOR metric for small values of $d$.
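Recursion (11) can be implemented directly with memoisation. The sketch below (hypothetical names, $\alpha = 1$ assumed) reproduces the hand-derived XOR entries of the $N = 4$ example, such as $q(3,2)$ and $q(3,3)$. The loop over $i$ builds the exclusion product $\prod_{j<i}(1 - z/d_{jd})$ incrementally.

```python
from functools import lru_cache


def make_q(z: float, metric: str):
    """Return a memoised q(d, k) built from recursion (9)-(11) at alpha = 1."""
    if metric == "circle":
        def side(i: int, d: int) -> int:  # d_{id} for the circle metric
            return d - i
    elif metric == "xor":
        def side(i: int, d: int) -> int:  # d_{id} for the XOR metric
            return i ^ d
    else:
        raise ValueError("metric must be 'circle' or 'xor'")

    @lru_cache(maxsize=None)
    def q(d: int, k: int) -> float:
        if k < 1 or k > d:
            return 0.0
        if k == 1:
            return z / d  # eq. (9): direct link
        total = 0.0
        excl = 1.0  # running product over j < i of (1 - z / d_{jd})
        for i in range(d):
            if i >= k - 1:
                total += z / side(i, d) * q(i, k - 1) * excl
            excl *= 1.0 - z / side(i, d)  # exclude a direct link to c_i
        return total

    return q
```

For instance, `make_q(0.5, "xor")(3, 3)` equals $z \cdot \tfrac{z^2}{3}(1 - z/2)^2(1 - z/3)$ at $z = 0.5$, the single path 0123.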

It is convenient to define a generating function for the probabilities $q(d,k)$:

$$Q(d,x) = \sum_{k=1}^{d} q(d,k)\,x^{k-1} \qquad (12)$$

We shall use the recurrence relation that this obeys in analytic work, but numerically it is more appropriate to consider directly the quantities measurable in simulation.

By summing over all possible numbers of hops we obtain the overall probability $r(d)$ of routing over a distance $d$:

$$r(d) = \sum_{k=1}^{d} q(d,k) = Q(d,1) \qquad (13)$$

This quantity also obeys a recursion relation, following from the relation for $q(d,k)$:

$$r(1) = z \qquad (14)$$

and

$$r(d) = \frac{z}{d} + \sum_{i=1}^{d-1}\frac{z}{d_{id}}\,r(i)\prod_{j=0}^{i-1}\left(1 - \frac{z}{d_{jd}}\right) \qquad (15)$$

In general $r(d)$ is a polynomial in $z$ of degree $d(d+1)/2$. For very small $d$, the explicit form can be deduced by summing entries from Table I.
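Because (15) only needs the running exclusion product, $r(d)$ can be computed for all $d \le d_{\max}$ in $O(d_{\max}^2)$ operations. The sketch below (illustrative names) does this in floating point and also expands $r(d)$ exactly as a polynomial in $z$ with rational coefficients, which confirms the degree $d(d+1)/2$ for small $d$.

```python
from fractions import Fraction


def side(i: int, d: int, metric: str) -> int:
    """Length d_{id} of the first hop a -> c_i, fixed by the triangle property."""
    return d - i if metric == "circle" else i ^ d


def routing_probability(dmax: int, z: float, metric: str) -> list:
    """r(1..dmax) from (14)-(15) in O(dmax^2) operations; r[0] is unused."""
    r = [0.0] * (dmax + 1)
    for d in range(1, dmax + 1):
        total = z / d           # direct link, q(d, 1)
        excl = 1.0 - z / d      # j = 0 factor: no direct link
        for i in range(1, d):
            s = side(i, d, metric)
            total += z / s * r[i] * excl
            excl *= 1.0 - z / s
        r[d] = total
    return r


def poly_mul(a, b):
    """Product of two coefficient lists (lowest power first)."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def poly_add(a, b):
    """Sum of two coefficient lists (lowest power first)."""
    n = max(len(a), len(b))
    return [(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) for i in range(n)]


def routing_polynomials(dmax: int, metric: str) -> list:
    """Exact coefficient lists of r(d) in powers of z for d = 1..dmax."""
    r = [None] * (dmax + 1)
    for d in range(1, dmax + 1):
        total = [Fraction(0), Fraction(1, d)]  # z / d
        excl = [Fraction(1), Fraction(-1, d)]  # 1 - z / d
        for i in range(1, d):
            s = side(i, d, metric)
            hop = [Fraction(0), Fraction(1, s)]  # z / d_{id}
            total = poly_add(total, poly_mul(poly_mul(hop, r[i]), excl))
            excl = poly_mul(excl, [Fraction(1), Fraction(-1, s)])
        r[d] = total
    return r
```

The top-degree term of $r(d)$ comes only from $i = d-1$, contributing degree $1 + (d-1)d/2 + (d-1) = d(d+1)/2$, so no cancellation can lower it.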




IV. ROUTABILITY 



The recursion relations are complex, with each new value depending on all previous ones. We therefore rely heavily on numerical results, though we can throw some analytic light on the system with the circle metric, especially at $z = 1$. When numerically solving the recursion relations for $r(d)$, values of $d$ up to 10^ are accessible in reasonable time on a desktop computer, while for $q(d,k)$ we can only approach $d \sim$ 10"* with similar effort. We have also checked these results, and investigated other properties, by directly simulating samples of the graphs and running the greedy routing algorithm on them. The simulations allow us to investigate properties such as the size of the giant component and the triangle cluster coefficient, besides the probability of success and the number of hops required for greedy routing. In reasonable time, graphs of sizes up to 32000 can be studied with several hundred samples.



A. Circle Metric 



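For the circle metric the recursion relation for $r(d)$ becomes

$$r(d) = \frac{z}{d} + \sum_{i=1}^{d-1}\frac{z}{d-i}\,r(i)\prod_{j=0}^{i-1}\left(1 - \frac{z}{d-j}\right) \qquad (16)$$

$$r(d) = \frac{z}{d} + \frac{z\,\Gamma(d+1-z)}{\Gamma(d+1)}\sum_{i=1}^{d-1} r(d-i)\,\frac{\Gamma(i)}{\Gamma(i+1-z)} \qquad (17)$$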
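The equivalence of the product form (16) and the gamma-function form (17) can be checked numerically. This is a minimal sketch, with illustrative function names, that evaluates both recursions and compares them; log-gamma is used to avoid overflow at large $d$.

```python
import math


def r_circle_direct(dmax: int, z: float) -> list:
    """r(d) on the circle metric from the product form (16)."""
    r = [0.0] * (dmax + 1)
    for d in range(1, dmax + 1):
        total, excl = z / d, 1.0 - z / d
        for i in range(1, d):
            total += z / (d - i) * r[i] * excl
            excl *= 1.0 - z / (d - i)
        r[d] = total
    return r


def r_circle_gamma(dmax: int, z: float) -> list:
    """r(d) on the circle metric from the gamma-function form (17), 0 < z <= 1."""
    r = [0.0] * (dmax + 1)
    # w[i] = Gamma(i) / Gamma(i + 1 - z), computed in log space
    w = [0.0] + [math.exp(math.lgamma(i) - math.lgamma(i + 1 - z)) for i in range(1, dmax)]
    for d in range(1, dmax + 1):
        pref = z * math.exp(math.lgamma(d + 1 - z) - math.lgamma(d + 1))
        r[d] = z / d + pref * sum(r[d - i] * w[i] for i in range(1, d))
    return r
```

The two forms are related by $\prod_{j=0}^{i-1}(1 - z/(d-j)) = \frac{\Gamma(d+1-z)\,\Gamma(d-i+1)}{\Gamma(d+1)\,\Gamma(d-i+1-z)}$ and the relabelling $i \to d-i$.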

FIG. 4: The routing probability $r(d)$ on the circle metric. From top to bottom, $z = 0.7, 0.5, 0.3$.

FIG. 5: The asymptotic constant value of $r(d)$ with the circle metric. Data points obtained by numerically fitting to the form described in the text using the last 5 10'' points of a solution of the recursion relations to d = 10^. Error bars are too small to be visible. The curve is the result of second order perturbation theory about the point $z = 1$.

Figure 4 shows $r(d)$ for some representative values of $z$. For $z = 1$ routability is guaranteed, since neighbours are connected around the circle. For values of $z$ in the approximate range $z > 0.3$, the form $r(d) = r + a\,d^{-z}$ with $r$ and $a$ constant provides a very good fit, especially at larger $d$. Indeed, second order corrections of the form $d^{-2z}$ can also be accurately identified. This form is supported by analysis of the recursion relation. If a constant $r(d) = r$ is inserted on the right hand side, the sum over gamma functions can be performed:



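$$r(d) = \frac{z}{d} + r\,\frac{z\,\Gamma(d+1-z)}{\Gamma(d+1)}\sum_{i=1}^{d-1}\frac{\Gamma(i)}{\Gamma(i+1-z)} = r + \frac{z(1-r)}{d} - r\,\frac{\Gamma(d+1-z)}{\Gamma(d+1)\,\Gamma(1-z)} \qquad (18)$$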



Note that part of the $1/d$ term is cancelled, and that the remaining ratio of gamma functions is indeed of order $d^{-z}$ at large $d$. Unfortunately this still does not allow us to obtain an analytic expression for $r$, because higher order terms are needed that rely on inserting the full (not asymptotic) form of the corrections to $r(d)$ in the sum. We shall return to a more careful asymptotic analysis below.
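The step to (18) rests on the telescoping identity $\frac{\Gamma(i+1)}{\Gamma(i+1-z)} - \frac{\Gamma(i)}{\Gamma(i-z)} = \frac{z\,\Gamma(i)}{\Gamma(i+1-z)}$, which gives $\sum_{i=1}^{d-1}\frac{\Gamma(i)}{\Gamma(i+1-z)} = \frac{1}{z}\left(\frac{\Gamma(d)}{\Gamma(d-z)} - \frac{1}{\Gamma(1-z)}\right)$. The following sketch (hypothetical names, $0 < z < 1$ assumed) compares the direct sum with the closed form.

```python
import math


def gamma_ratio(a: float, b: float) -> float:
    """Gamma(a) / Gamma(b) for positive arguments, via log-gamma."""
    return math.exp(math.lgamma(a) - math.lgamma(b))


def rhs_with_constant_r(d: int, z: float, r: float) -> float:
    """Right hand side of (17) with r(i) replaced by the constant r (direct sum)."""
    s = sum(gamma_ratio(i, i + 1 - z) for i in range(1, d))
    return z / d + r * z * gamma_ratio(d + 1 - z, d + 1) * s


def closed_form_18(d: int, z: float, r: float) -> float:
    """Closed form on the right of (18); valid for 0 < z < 1."""
    return r + z * (1 - r) / d - r * gamma_ratio(d + 1 - z, d + 1) / math.gamma(1 - z)
```

At $d = 2$ both expressions reduce by hand to $z/2 + r\,z(2-z)/2$.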

Simpler equations are obtained by perturbing away from the known behaviour at $z = 0$ and $z = 1$. An expansion of $r(d)$ is inserted into the recursion relation, and although the resulting equations for the coefficients at each order still depend on all coefficients at smaller $d$, as is the case for the full equation, the structure is sufficiently simple to allow solution.

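To second order in $1 - z$ we write:

$$r(d) = 1 - (1-z)\,r_1(d) - (1-z)^2\,r_2(d) \qquad (19)$$

Then the equation for $r_1(d)$ becomes:

$$r_1(d) = \frac{1}{d} + \frac{1}{d}\sum_{i=1}^{d-1} r_1(i) \qquad (20)$$

with solution $r_1(d) = 1$, independent of $d$. The equation for the second order coefficient is:

$$r_2(d) = \frac{d-1}{d} + \frac{1}{d}\sum_{i=1}^{d-1} r_2(i) - \frac{1}{d}\sum_{i=1}^{d-1}\;\sum_{d-i \le l < m \le d-1}\frac{1}{lm} \qquad (21)$$

This has the solution $r_2(d) = 1 - 1/d$. The finite size correction $d^{-z}$ seen numerically and included in the asymptotic analysis is expected to appear as a log at third order.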

Combining these terms and taking the large $d$ limit gives the asymptotic constant part of $r(d)$:

$$r = 1 - (1-z) - (1-z)^2 \qquad (22)$$

In figure 5 this curve is shown alongside the numerically determined value of the asymptotic constant $r$, and it is a good match for larger values of $z$. The value of $r$ vanishes below a critical value $z_c = (3 - \sqrt{5})/2 \approx 0.38197$. This is in the correct vicinity of a numerical transition, and provides support for the existence of this transition.
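The second order result can be checked in two ways in the sketch below (helper names are hypothetical): equation (21) is solved exactly in rational arithmetic, and $r_2(d)$ is also extracted numerically from the circle recursion at $z = 1 - \varepsilon$ with small $\varepsilon$. The last function evaluates the root $z_c$ of (22).

```python
import math
from fractions import Fraction


def r2_exact(dmax: int) -> list:
    """Solve the second order equation (21) exactly with rational arithmetic."""
    r2 = [Fraction(0)] * (dmax + 1)
    for d in range(1, dmax + 1):
        pair_sum = Fraction(0)
        for i in range(1, d):
            ns = range(d - i, d)  # l and m run over d-i .. d-1
            s1 = sum(Fraction(1, n) for n in ns)
            s2 = sum(Fraction(1, n * n) for n in ns)
            pair_sum += (s1 * s1 - s2) / 2  # sum over l < m of 1 / (l m)
        r2[d] = Fraction(d - 1, d) + (sum(r2[1:d]) - pair_sum) / d
    return r2


def r_circle(dmax: int, z: float) -> list:
    """Numerical r(d) for the circle metric, eq. (16)."""
    r = [0.0] * (dmax + 1)
    for d in range(1, dmax + 1):
        total, excl = z / d, 1.0 - z / d
        for i in range(1, d):
            total += z / (d - i) * r[i] * excl
            excl *= 1.0 - z / (d - i)
        r[d] = total
    return r


def critical_z() -> float:
    """Root of 1 - eps - eps**2 = 0 with eps = 1 - z, i.e. where (22) vanishes."""
    eps = (math.sqrt(5) - 1) / 2
    return 1 - eps
```

With $\varepsilon = 10^{-3}$, the quantity $(1 - \varepsilon - r(d))/\varepsilon^2$ tracks $1 - 1/d$ up to a correction of order $\varepsilon$.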

The equations for the perturbative expansion around $z = 0$ have a simpler structure, no longer involving all coefficients at smaller values of $d$, but the terms appearing in them have a more complicated analytic form. For small values of $z$ the perturbative series to second order is:

$$r(d) \approx \frac{z}{d} + \frac{2H_{d-1}}{d}\,z^2 \qquad (23)$$

where each term separately vanishes in the large $d$ limit. However, the higher order terms, which are expected to be of the form $(\log d)^{n-1}z^n/d$, do so slowly, and examples below will provide a warning that this kind of series can easily sum to a constant.
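The small-$z$ expansion (23) can be checked the same way: solve the circle recursion at a very small $z$ and extract the coefficient of $z^2$ numerically. The value $z = 10^{-4}$ and the helper names below are illustrative choices.

```python
def harmonic(n: int) -> float:
    """Ordinary harmonic number H_n (H_0 = 0)."""
    return sum(1.0 / i for i in range(1, n + 1))


def r_circle(dmax: int, z: float) -> list:
    """Numerical r(d) for the circle metric, eq. (16)."""
    r = [0.0] * (dmax + 1)
    for d in range(1, dmax + 1):
        total, excl = z / d, 1.0 - z / d
        for i in range(1, d):
            total += z / (d - i) * r[i] * excl
            excl *= 1.0 - z / (d - i)
        r[d] = total
    return r


def second_order_coefficients(dmax: int, z: float = 1e-4) -> list:
    """Estimate the z^2 coefficient of r(d) as (r(d) - z/d) / z^2; entry 0 unused."""
    r = r_circle(dmax, z)
    return [None] + [(r[d] - z / d) / z ** 2 for d in range(1, dmax + 1)]
```

The $z^2$ term comes from two-hop paths alone, $\sum_{i=1}^{d-1}\frac{1}{(d-i)\,i} = \frac{2H_{d-1}}{d}$; the exclusion products only enter at order $z^3$.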

The numerical data in figure 5 can only indicate that the asymptotic constant $r$ becomes very small for $z$ below the putative transition. We would like to investigate the asymptotic properties of the equations more closely in order to gain more evidence for a transition. As a model for this we start by considering the special value $z = 1$, where the equations simplify to the extent that progress can be made. It is straightforward to check that $r(d) = 1$ is a solution of the recursion relations without any need to take limits. But to study this in more detail we look at the probabilities $q(d,k)$, which include the hop information. On the circle the recursion relation becomes:

$$q(d,k+1) = \frac{1}{d}\sum_{i=k}^{d-1} q(i,k) \qquad (24)$$

i—k 

And we can obtain the first few values immediately:

$$q(d,1) = \frac{1}{d}$$
$$q(d,2) = \frac{1}{d}H_{d-1}$$
$$q(d,3) = \frac{1}{2d}\left((H_{d-1})^2 - H_{d-1,2}\right)$$
$$q(d,4) = \frac{1}{6d}\left((H_{d-1})^3 - 3H_{d-1}H_{d-1,2} + 2H_{d-1,3}\right) \qquad (25)$$

where the $H_{n,m}$ are generalised harmonic numbers.
For a general solution it is better to return to the form:

$$q(d,k) = \frac{1}{d}\sum_{d-1 \ge i_{k-1} > i_{k-2} > \cdots > i_2 > i_1 \ge 1}\;\prod_{l=1}^{k-1}\frac{1}{i_l} \qquad (26)$$

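Then, by an exercise in combinatorics [T], the generating function for the $q(d,k)$ is

$$Q(d,x) = \sum_{k=1}^{d} q(d,k)\,x^{k-1} = \frac{1}{d}\prod_{i=1}^{d-1}\left(1 + \frac{x}{i}\right) = \frac{1}{d}\exp\left(-\sum_{k=1}^{\infty}\frac{H_{d-1,k}\,(-x)^k}{k}\right) \sim \frac{d^{x-1}}{\Gamma(1+x)} \qquad (27)$$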

The generating function obeys the relation $(d+1)\,Q(d+1,x) = (d+x)\,Q(d,x)$. By expanding, the general form of $q(d,k)$ along the lines of (25) can be written as a sum over partitions. A recurrence relation for $q(d,k)$ in terms of the generalised harmonic numbers also follows.



To recover the $z = 1$ result $r(d) = 1$ by summing $q(d,k)$ according to formula (13), some care is needed. Although the correct result is obtained by simply summing the leading asymptotic term $\log^{k-1} d/(d\,(k-1)!)$, there is no a priori reason why subleading terms should not also contribute. We proceed using the generating function:

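$$r(d) = Q(d,1) = \frac{1}{d}\exp\left(-\sum_{k=1}^{\infty}\frac{H_{d-1,k}\,(-1)^k}{k}\right) \approx \frac{1}{d}\exp\left(\gamma + \log d - \sum_{k=2}^{\infty}\frac{\zeta(k)\,(-1)^k}{k}\right) = 1 \qquad (28)$$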

where the result follows at finite $d$ from interchanging the order of the sums in the first exponential, but we have proceeded to the limit using an identity relating the alternating sum of zeta functions to the Euler-Mascheroni constant $\gamma$. This approach, still at $z = 1$, can be extended to compute the expectation value of the number of hops:

$$\langle k\rangle = \frac{1}{r(d)}\sum_{k=1}^{d} k\,q(d,k) = 1 + Q'(d,1) = 1 - \sum_{k=1}^{\infty}H_{d-1,k}\,(-1)^k = H_d \approx \gamma + \log d \qquad (29)$$

This form is confirmed numerically and acts as a check of the numerical implementation. 
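At $z = 1$ these closed forms are easy to confirm numerically. This sketch (illustrative names) builds $q(d,k)$ from recursion (24) and checks the normalisation $r(d) = 1$, the mean number of hops $H_d$ from (29), the $k = 3$ entry of (25), and the product in (27), which equals $\Gamma(d+x)/(\Gamma(d+1)\,\Gamma(1+x))$.

```python
import math


def q_table_z1(dmax: int) -> list:
    """q[d][k] at z = 1 on the circle, from q(d,1) = 1/d and recursion (24)."""
    q = [[0.0] * (dmax + 2) for _ in range(dmax + 1)]
    for d in range(1, dmax + 1):
        q[d][1] = 1.0 / d
        for k in range(1, d):
            q[d][k + 1] = sum(q[i][k] for i in range(k, d)) / d
    return q


def generating_function(q_row: list, x: float) -> float:
    """Q(d, x) = sum_k q(d, k) x^(k-1), eq. (12)."""
    return sum(qk * x ** (k - 1) for k, qk in enumerate(q_row) if k >= 1)


def harmonic(n: int, m: int = 1) -> float:
    """Generalised harmonic number H_{n,m}."""
    return sum(1.0 / i ** m for i in range(1, n + 1))
```

For example, with `q = q_table_z1(40)`, the row `q[20]` sums to 1 and its mean hop count `sum(k * qk for k, qk in enumerate(q[20]))` equals $H_{20}$.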

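Using intuition gained from the solution of the $z = 1$ case, we wish to perform a similar analysis for $z < 1$. The analysis is based on the recurrence for the generating function:

$$Q(d,x) = \frac{z}{d} + x\,\frac{z\,\Gamma(d+1-z)}{\Gamma(d+1)}\sum_{i=1}^{d-1}Q(d-i,x)\,\frac{\Gamma(i)}{\Gamma(i+1-z)} \qquad (30)$$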
We search for an asymptotic solution resembling the asymptotic form at the end of (27), of the form



$$Q(d,x) = \frac{c_1}{d^{1-\beta}} + \frac{c_2}{d^{1-\beta+z}} + \cdots \qquad (31)$$

where the coefficients $c_i$ and the exponent $\beta$ are functions of $z$ and $x$. We insert this expansion on the right hand side of (30), relying on the fact that a similar procedure works at $z = 1$. Then, using the Euler-Maclaurin formula to estimate the sum in terms of an integral and matching powers of $d$, we find an equation for the exponent, a series of equations relating the coefficients, and, from the $1/d$ term, a normalisation equation for the coefficients that involves them all. Only the equation for the exponent can be solved in isolation:

$$\frac{\Gamma(\beta+z)}{\Gamma(\beta)} = x\,\Gamma(1+z) \qquad (32)$$



This correctly reproduces $\beta = x$ at $z = 1$, and from graphical considerations it is clear that there is a unique solution $\beta(x,z)$ for all values of $z$ and $x$ in their range. At $x = 1$ we find $\beta = 1$ for all values of $z$, so the leading term in $r(d) = Q(d,1)$ is constant and the exponent alone is unable to indicate a transition. Fortunately, the expectation value of the number of hops does provide a way of accessing the value of $\beta$, or at least its derivative:

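$$\langle k\rangle = 1 + \frac{Q'(d,1)}{Q(d,1)} \approx 1 + \frac{c_1'}{c_1} + \beta'\log d \qquad (33)$$

where the derivatives are with respect to $x$ and all terms are evaluated at $x = 1$. The coefficient of the log is given by:

$$\beta' = \frac{1}{\psi(1+z) - \psi(1)} \qquad (34)$$

where $\psi$ is the digamma function.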
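Equation (32) has no closed-form solution for general $z$, but its left side is increasing in $\beta$, so it is easily solved by bisection. The sketch below (illustrative names; the series $\psi(1+z) - \psi(1) = \sum_{n \ge 1} z/(n(n+z))$ stands in for a library digamma function) solves for $\beta(x,z)$, differentiates numerically at $x = 1$, and compares with (34).

```python
import math


def beta_exponent(x: float, z: float) -> float:
    """Solve (32), Gamma(beta + z) / Gamma(beta) = x Gamma(1 + z), by bisection.

    The log of the left side is increasing in beta (digamma is increasing),
    so the root is unique.
    """
    target = math.log(x) + math.lgamma(1.0 + z)
    lo, hi = 1e-12, 100.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if math.lgamma(mid + z) - math.lgamma(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def digamma_gap(z: float, terms: int = 200_000) -> float:
    """psi(1 + z) - psi(1) = sum_{n>=1} z / (n (n + z)), plus a z / terms tail estimate."""
    return sum(z / (n * (n + z)) for n in range(1, terms + 1)) + z / terms


def beta_slope_numeric(z: float, h: float = 1e-5) -> float:
    """Central-difference estimate of d(beta)/dx at x = 1."""
    return (beta_exponent(1 + h, z) - beta_exponent(1 - h, z)) / (2 * h)
```

As $z \to 0$ the gap $\psi(1+z) - \psi(1)$ tends to zero, so the predicted slope (34) grows without bound, consistent with the behaviour of the prediction below the transition.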




FIG. 6: The coefficients $a$ (left) and $b$ (right) of a logarithmic fit $\langle k\rangle = a + b\log d$ using the last 5000 points of a numerical solution on the circle metric to $d$ = 10^. Error bars are too small to be visible. The curve in the figure for $b$ arises from the asymptotic analysis discussed in the text.



Numerically, a logarithmic form is a good fit to the expected number of hops, and indeed at $z = 1$, (29) is exact and was used as a check of the computer implementation of the recursion relation for $q(d,k)$. For all $z < 1$ the numerical curves of $\langle k\rangle$ are accurately fitted by $a + b\log d$, and the coefficients $a$ and $b$ are shown as functions of $z$ in figure 6. The prediction of (34) is shown in the second of these figures and is accurate for values of $z$ above the transition. Below that point the prediction continues to grow, but the data reverses its trend. This is presumably a finite size effect, as in the thermodynamic limit the probability of successful routing vanishes below the transition and $\langle k\rangle$ cannot be defined.

Returning to the asymptotic analysis of (30), the normalisation equations for the $c_i$ involve all the coefficients, and we cannot solve them to obtain any expression for $r = c_1$. However, the ratio $c_2/c_1$ of the subleading to leading coefficients can be computed. Again this compares favourably with numerical results in the region above the transition. However, in neither of the cases where it has been tested has the asymptotic analysis indicated the existence of a transition. Based on the clear disagreement with the numerical results and the evident difficulty in separating the leading and subleading terms in (31) for small $z$, we conclude that our asymptotic analysis fails in this region and is unable to provide information about the transition.
FIG. 7: The routing probability r(d) on the XOR metric at z = 0.5.



XOR Metric 



For the XOR metric the recursion relations are less amenable to analytic work, even at the special value z = 1.
The numerical solution of the following recurrence for r(d) is shown in figure 7:



1 - 



(35) 



The large fluctuations in r(d) as d increments by small amounts are characteristic of all results with this metric, and
it is therefore not appropriate to fit the whole curve. We have selected points at d = 2^m and at d = 2^m − 1, which
appear to bound r(d) from below and above respectively and provide smooth curves that can be fitted. The fitting
procedure itself differs from that used on the circle, not least because there is no reason to expect the exponent to
take the value z. In order to avoid numerical instability, we have chosen to fit to the form a² + b²/d^c, where positivity
of the coefficients is enforced and the exponent c is also fitted. Since the selected data points are spaced exponentially,
the choice of which points to use in the fit is also different: based on runs up to d = 10^, all points above d = 1000
have been chosen.



FIG. 8: Results of a numerical fit to r(d) based on the fitting procedure described in the text. The left-hand plot shows the
asymptotic constant part r = a² and the right-hand plot shows the exponent c of the finite-size corrections. In the left-hand plot
the upper points are a fit for d = 2^m − 1 and the lower points are for d = 2^m, while this identification is reversed for the
right-hand plot.

The results of these fits for the constant part and the exponent are shown in figure 8. Notice that the routing success
is below one even for z = 1 and that the exponent certainly does not behave as z, so the situation is rather different
from that for the circle metric. The exponent takes a similar value for both bounds, and we might anticipate that
a common exponent describes all values of d. The asymptotic constant r = a² takes distinct values for each
bound, but both curves appear to converge, indicating a transition in the same vicinity.
Although the XOR metric leads to substantially more fluctuation in the mean number of hops ⟨k⟩ than the circle
metric, the overall trend remains logarithmic. The two panels of figure 9 show the coefficients a and b of a fit to the
form a + b log d. These curves are less abrupt than for the circle metric but follow the same trend. Notice in particular
that the average number of hops required by the XOR metric is less than that needed by the circle metric when
searching for nodes an equal distance away, and moreover that it remains much more stable in the region above
the transition.




FIG. 9: The coefficients a (left) and b (right) of a logarithmic fit to ⟨k⟩, obtained by numerically fitting to the form described
in the text using the last 5000 points of a run to d = 10^. Error bars are shown.



V. CONCLUSION 

We have considered the problem of efficiently finding a labelled node on a random graph using a simple greedy
algorithm that decides which link to take solely on the basis of the labels of the neighbouring nodes to
which it is directly connected. To enable this, the random graph was constrained to have a small-world-like structure,
and we proposed and constructed a class of graphs with properties intended to facilitate the search.

In the limit of very large graphs we have demonstrated that, for an appropriate range of parameters, the search
for a node arbitrarily far away has a finite probability of success. Moreover, we have strong numerical hints that the
system displays a transition between a regime with finite probability of locating a desired node at large z and a regime
where this is not guaranteed at smaller z. Unfortunately, the log N dependence of many quantities makes it hard for
numerical work to predict behaviour in the thermodynamic limit accurately. The existence of the transition for the
circle metric has some analytic support, with both the perturbative and asymptotic analyses matching
numerical results at larger z, and with the perturbative approach indeed predicting a transition at finite z. For the
XOR metric, where analytic work is much harder, the graphs obtained from numerical solution of the equations have
a similar shape to those for the circle metric, and we expect the same conclusion.

Our results suggest that random graphs based on the XOR metric are less likely to lead to successful routing
than those based on the circle. The reason is that, while the sum of the hop distances equals the total distance to
the end node on the circle, for the XOR metric it is in general greater, and consequently the probability of such links
existing is smaller. On the other hand, the number of hops needed to reach the destination is typically smaller for
the XOR metric than for the circle metric. Kademlia, which uses the XOR metric, has nevertheless generally been favoured
over Chord, which uses the circle, in present-day peer-to-peer systems. This merely emphasises the difference between our
approach and practical peer-to-peer systems, which are dynamic: their lists of stored links change in order to optimise
routing, which is eventually always successful. The XOR metric is symmetric, so the results of any query can be used to
update the local table, thus minimising the number of queries needed.
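The inequality behind this explanation is the triangle inequality for XOR. Writing a = u ⊕ v for a hop from u to v and b = v ⊕ t for the remaining distance to the target t, the identity a ⊕ b = a + b − 2(a & b) (with & the bitwise AND) shows that the hop and the remaining distance together exceed the direct distance u ⊕ t by 2(a & b), and equal it only when they share no set bits. The Python check below compares this with a clockwise, Chord-like distance on a circle of n labels; treating the circle distance as clockwise, and the helper names, are assumptions made for illustration.

```python
def xor_excess(u, v, t):
    """(u XOR v) + (v XOR t) - (u XOR t): how far a hop u -> v plus the
    remaining distance v -> t overshoots the direct XOR distance u -> t."""
    return (u ^ v) + (v ^ t) - (u ^ t)


def circle_excess(u, v, t, n):
    """The same quantity on a circle of n labels with clockwise distance (b - a) mod n."""
    def cw(a, b):
        return (b - a) % n

    return cw(u, v) + cw(v, t) - cw(u, t)


def additive_fraction(t, bits):
    """Fraction of labels v (starting from u = 0) for which the hop and remaining
    XOR distances add up exactly to the total distance t."""
    size = 2 ** bits
    exact = sum(1 for v in range(size) if xor_excess(0, v, t) == 0)
    return exact / size


if __name__ == "__main__":
    # Only labels whose set bits are a subset of t's bits give no excess.
    print(additive_fraction(t=0b1010110, bits=7))
```

On the circle every label lying clockwise between u and t gives zero excess, whereas on the XOR metric only 2^(number of set bits of t) of the labels do.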
Appendix A: Sums involving XOR



Here we examine some unusual sums that involve the XOR operator. The precise value of these sums fluctuates 
within an envelope, so it is the bounds that are studied. 



First, it is helpful to investigate

$$S_1(n) = \sum_{i=1}^{n-1} \frac{1}{i\,(n \oplus i)} \qquad (A1)$$



For n = 2^m − 1 and for the range of i appearing in the sum, it becomes apparent that n ⊕ i = n − i by considering
the binary form of n, whose lowest m bits are all set. Using 1/(i(n − i)) = (1/i + 1/(n − i))/n, in this case
S_1 = 2H_{n−1}/n, and this constitutes the upper bound.

For n = 2^m, similar considerations lead to n ⊕ i = n + i, and in this case S_1 = 2H_{n−1}/n − H_{2n−1}/n + 1/n², which
acts as a lower bound. These are indeed the extremes, because |n − i| ≤ n ⊕ i ≤ n + i for every i.

By taking the n → ∞ limit we find that

$$\frac{\log n}{n} \lesssim S_1(n) \lesssim \frac{2\log n}{n} \qquad (A2)$$
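These identities and the resulting envelope are easy to check with exact rational arithmetic. The Python sketch below assumes that (A1) defines S_1(n) = Σ_{i=1}^{n−1} 1/(i (n ⊕ i)), the form consistent with both closed-form values quoted above. Because |n − i| ≤ n ⊕ i ≤ n + i for every 0 < i < n, the sums Σ 1/(i(n + i)) and Σ 1/(i(n − i)), whose closed forms coincide with the two values quoted above, bound S_1(n) for every n, not only at n = 2^m and n = 2^m − 1. The helper names are hypothetical.

```python
import math
from fractions import Fraction


def harmonic(n):
    """Harmonic number H_n as an exact fraction."""
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))


def s1(n):
    """S1(n) = sum_{i=1}^{n-1} 1 / (i * (n XOR i)), computed exactly."""
    return sum((Fraction(1, i * (n ^ i)) for i in range(1, n)), Fraction(0))


def envelope(n):
    """Closed forms of sum 1/(i(n+i)) and sum 1/(i(n-i)) over i = 1..n-1.

    They bound S1(n) for every n and are attained at n = 2**m and n = 2**m - 1.
    """
    lower = 2 * harmonic(n - 1) / n - harmonic(2 * n - 1) / n + Fraction(1, n * n)
    upper = 2 * harmonic(n - 1) / n
    return lower, upper


if __name__ == "__main__":
    # Compare n * S1(n) / log(n) with the envelope; (A2) concerns n -> infinity.
    for n in (255, 256, 300):
        lower, upper = envelope(n)
        scale = n / math.log(n)
        print(n, float(lower) * scale, float(s1(n)) * scale, float(upper) * scale)
```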

Now consider the more elaborate sum that appears in the expression for the triangle clustering coefficient. 

$$S_2(n) = \sum_{i=1}^{n-1} \sum_{j=1}^{i-1} \frac{1}{i\,j\,(i \oplus j)} = \sum_{i=1}^{n-1} \frac{S_1(i)}{i} \qquad (A3)$$

Note that this obeys the recurrence relation:

$$S_2(n+1) = S_2(n) + \frac{S_1(n)}{n} \qquad (A4)$$

This allows us to determine the form of the finite-size corrections to the asymptotic constant:

$$S_2(n) \approx \text{const} - b\,\frac{\log n}{n} \qquad (A5)$$



The bounds on S_1(n) in (A2) translate into bounds on the parameter b, but the main purpose of this exercise is to
justify the form of the corrections and thus allow accurate numerical determination of the asymptotic constant. This
is necessary because, in contrast to the circle case, where a recurrence relation can be used, for XOR the full sum must
be performed in numerical work, which restricts the accessible sizes.
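A short Python sketch of this procedure follows, assuming (A3) defines S_2(n) = Σ_{i=1}^{n−1} Σ_{j=1}^{i−1} 1/(i j (i ⊕ j)) = Σ_{i=1}^{n−1} S_1(i)/i with S_1 as in (A1). Each S_1(i) still needs its full O(i) sum, so tabulating S_2 up to n costs O(n²); the constant is then extracted by a linear least-squares fit of S_2(n) to const − b log n/n over the largest sizes. The function names and the fitting window are illustrative choices.

```python
import math


def s1(n):
    """S1(n) = sum_{i=1}^{n-1} 1/(i (n XOR i)); no simple recurrence, so sum in full."""
    return sum(1.0 / (i * (n ^ i)) for i in range(1, n))


def s2_table(n_max):
    """List with s2[n] = S2(n) for 1 <= n <= n_max, via S2(n+1) = S2(n) + S1(n)/n."""
    s2 = [0.0, 0.0]  # s2[0] unused; S2(1) = 0 is an empty sum
    for n in range(1, n_max):
        s2.append(s2[n] + s1(n) / n)
    return s2


def s2_direct(n):
    """Double-sum definition, used only to cross-check the recurrence for small n."""
    return sum(1.0 / (i * j * (i ^ j)) for i in range(1, n) for j in range(1, i))


def extrapolate(s2, n_min):
    """Fit S2(n) ~ const - b*log(n)/n for n_min <= n < len(s2); return (const, b)."""
    ns = range(n_min, len(s2))
    xs = [math.log(n) / n for n in ns]
    ys = [s2[n] for n in ns]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - slope * mx, -slope
```

For example, `extrapolate(s2_table(4000), 2000)` fits the upper half of a run to n = 4000; because of the XOR fluctuations in S_1, the fitted b need only lie roughly within the range allowed by (A2).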

For the circle metric, the analogous sum has the same type of corrections to the asymptotic form, but does
not suffer from the variation induced by XOR.